COX_HAZARDS
Overview
The COX_HAZARDS function fits a Cox Proportional Hazards regression model for survival analysis, enabling the analysis of time-to-event data while accounting for censored observations. Introduced by Sir David Cox in 1972, this semi-parametric model estimates the effect of covariates on the hazard rate without requiring specification of the baseline hazard function.
The Cox model assumes that covariates have a multiplicative effect on the hazard function, expressed as:
\lambda(t|X_i) = \lambda_0(t) \exp(X_i \cdot \beta)
where \lambda_0(t) is the baseline hazard at time t, X_i represents the covariate vector for subject i, and \beta are the regression coefficients. The model is called “proportional hazards” because the ratio of hazards between any two subjects is constant over time, depending only on their covariate values.
This implementation uses the PHReg class from statsmodels, which maximizes the partial likelihood using either the Breslow or Efron method to handle tied event times. The function returns regression coefficients, standard errors, z-statistics, p-values, confidence intervals, and hazard ratios (the exponentiated coefficients). Hazard ratios represent the multiplicative change in the hazard for a one-unit increase in a covariate.
Cox models are widely used in medical research, reliability engineering, and econometrics to model survival times, failure times, or duration until an event occurs. The model handles right-censored data, where some subjects have not experienced the event by the end of the observation period. For more details on the statistical theory, see the Wikipedia article on proportional hazards models and the statsmodels duration analysis documentation.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=COX_HAZARDS(time, event, x, fit_intercept, alpha)
time(list[list], required): Column vector of non-negative time values representing time to event or censoring.event(list[list], required): Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.x(list[list], required): Matrix of covariates (predictors) where each column is a different predictor.fit_intercept(bool, optional, default: false): Whether to fit an intercept term. Cox model typically does not include an intercept.alpha(float, optional, default: 0.05): Significance level for confidence intervals. Must be between 0 and 1.
Returns (list[list]): 2D list with Cox regression results, or error message string.
Examples
Example 1: Demo case 1
Inputs:
| time | event | x |
|---|---|---|
| 5 | 1 | 1 |
| 10 | 1 | 2 |
| 15 | 0 | 1.5 |
| 20 | 1 | 2.5 |
| 25 | 1 | 1.8 |
| 30 | 0 | 2.2 |
| 35 | 1 | 1.2 |
| 40 | 1 | 2.8 |
Excel formula:
=COX_HAZARDS({5;10;15;20;25;30;35;40}, {1;1;0;1;1;0;1;1}, {1;2;1.5;2.5;1.8;2.2;1.2;2.8})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| x1 | -0.8215 | 0.7938 | -1.035 | 0.3008 | -2.377 | 0.7344 | 0.4398 |
| log_likelihood | -7.127 |
Example 2: Demo case 2
Inputs:
| time | event | x | alpha | |
|---|---|---|---|---|
| 6 | 1 | 1 | 50 | 0.1 |
| 13 | 1 | 2 | 45 | |
| 18 | 1 | 1.5 | 55 | |
| 23 | 0 | 2.5 | 48 | |
| 30 | 1 | 1.2 | 52 | |
| 35 | 1 | 2.2 | 46 | |
| 42 | 1 | 1.8 | 49 | |
| 50 | 0 | 2 | 51 |
Excel formula:
=COX_HAZARDS({6;13;18;23;30;35;42;50}, {1;1;1;0;1;1;1;0}, {1,50;2,45;1.5,55;2.5,48;1.2,52;2.2,46;1.8,49;2,51}, 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| x1 | -4.023 | 2.082 | -1.932 | 0.05335 | -7.448 | -0.598 | 0.0179 |
| x2 | -0.4303 | 0.3201 | -1.344 | 0.1789 | -0.9567 | 0.09622 | 0.6503 |
| log_likelihood | -6.212 |
Example 3: Demo case 3
Inputs:
| time | event | x | fit_intercept | alpha |
|---|---|---|---|---|
| 8 | 1 | 1.2 | false | 0.05 |
| 12 | 0 | 1.8 | ||
| 16 | 1 | 1.5 | ||
| 22 | 1 | 2.1 | ||
| 28 | 1 | 1.6 | ||
| 32 | 1 | 2.3 | ||
| 38 | 0 | 1.4 | ||
| 45 | 1 | 2.5 | ||
| 50 | 1 | 1.9 | ||
| 55 | 1 | 2.7 |
Excel formula:
=COX_HAZARDS({8;12;16;22;28;32;38;45;50;55}, {1;0;1;1;1;1;0;1;1;1}, {1.2;1.8;1.5;2.1;1.6;2.3;1.4;2.5;1.9;2.7}, FALSE, 0.05)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| x1 | -1.408 | 0.9598 | -1.467 | 0.1423 | -3.289 | 0.4729 | 0.2446 |
| log_likelihood | -10.32 |
Example 4: Demo case 4
Inputs:
| time | event | x |
|---|---|---|
| 3 | 1 | 0.8 |
| 7 | 1 | 1.2 |
| 11 | 1 | 1.5 |
| 15 | 1 | 0.9 |
| 19 | 0 | 1.1 |
| 24 | 1 | 1.4 |
| 29 | 1 | 1 |
| 33 | 0 | 1.3 |
| 38 | 1 | 1.6 |
Excel formula:
=COX_HAZARDS({3;7;11;15;19;24;29;33;38}, {1;1;1;1;0;1;1;0;1}, {0.8;1.2;1.5;0.9;1.1;1.4;1;1.3;1.6})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper | hazard_ratio |
|---|---|---|---|---|---|---|---|
| x1 | -2.313 | 1.871 | -1.236 | 0.2163 | -5.98 | 1.354 | 0.09896 |
| log_likelihood | -9.66 |
Python Code
import math
import pandas as pd
import numpy as np
from statsmodels.duration.hazard_regression import PHReg as statsmodels_phreg
def cox_hazards(time, event, x, fit_intercept=False, alpha=0.05):
"""
Fits a Cox Proportional Hazards regression model for survival data.
See: https://www.statsmodels.org/stable/generated/statsmodels.duration.hazard_regression.PHReg.html
This example function is provided as-is without any representation of accuracy.
Args:
time (list[list]): Column vector of non-negative time values representing time to event or censoring.
event (list[list]): Column vector of event indicators where 1 indicates event occurred and 0 indicates censored.
x (list[list]): Matrix of covariates (predictors) where each column is a different predictor.
fit_intercept (bool, optional): Whether to fit an intercept term. Cox model typically does not include an intercept. Default is False.
alpha (float, optional): Significance level for confidence intervals. Must be between 0 and 1. Default is 0.05.
Returns:
list[list]: 2D list with Cox regression results, or error message string.
"""
def to2d(x):
return [[x]] if not isinstance(x, list) else x
def validate_numeric_2d(arr, name, allow_negative=True):
# Validates a 2D list and converts to numpy array
arr_2d = to2d(arr)
if not isinstance(arr_2d, list):
return f"Invalid input: {name} must be a 2D list."
if not all(isinstance(row, list) for row in arr_2d):
return f"Invalid input: {name} must be a 2D list."
try:
flat = []
for row in arr_2d:
for val in row:
num = float(val)
if math.isnan(num) or math.isinf(num):
return f"Invalid input: {name} contains non-finite values."
if not allow_negative and num < 0:
return f"Invalid input: {name} must be non-negative."
flat.append(num)
arr_np = np.array(arr_2d, dtype=float)
return arr_np
except Exception:
return f"Invalid input: {name} must contain numeric values."
def validate_scalar_numeric(val, name, allow_negative=True):
# Validates a scalar numeric value
try:
num = float(val)
except Exception:
return f"Invalid input: {name} must be a number."
if math.isnan(num) or math.isinf(num):
return f"Invalid input: {name} must be finite."
if not allow_negative and num < 0:
return f"Invalid input: {name} must be non-negative."
return num
# Validate time
time_arr = validate_numeric_2d(time, "time", allow_negative=False)
if isinstance(time_arr, str):
return time_arr
if time_arr.ndim != 2 or time_arr.shape[1] != 1:
return "Invalid input: time must be a column vector (2D list with one column)."
time_vec = time_arr.flatten()
# Validate event
event_arr = validate_numeric_2d(event, "event", allow_negative=False)
if isinstance(event_arr, str):
return event_arr
if event_arr.ndim != 2 or event_arr.shape[1] != 1:
return "Invalid input: event must be a column vector (2D list with one column)."
event_vec = event_arr.flatten()
# Check time and event have same length
if len(time_vec) != len(event_vec):
return "Invalid input: time and event must have the same length."
# Validate event values are 0 or 1
if not all((e == 0 or e == 1) for e in event_vec):
return "Invalid input: event values must be 0 (censored) or 1 (event occurred)."
# Validate x
x_arr = validate_numeric_2d(x, "x")
if isinstance(x_arr, str):
return x_arr
if x_arr.ndim != 2:
return "Invalid input: x must be a 2D matrix."
if x_arr.shape[0] != len(time_vec):
return "Invalid input: x must have the same number of rows as time and event."
# Validate fit_intercept
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
# Validate alpha
alpha_val = validate_scalar_numeric(alpha, "alpha", allow_negative=False)
if isinstance(alpha_val, str):
return alpha_val
if alpha_val <= 0 or alpha_val >= 1:
return "Invalid input: alpha must be between 0 and 1."
# Prepare data for statsmodels
n_obs = len(time_vec)
n_covariates = x_arr.shape[1]
# Create column names for covariates
covariate_names = [f"x{i+1}" for i in range(n_covariates)]
# Create DataFrame with covariates
data_dict = {name: x_arr[:, i] for i, name in enumerate(covariate_names)}
data_dict["time"] = time_vec
data_dict["event"] = event_vec
df = pd.DataFrame(data_dict)
# Add intercept if requested
if fit_intercept:
df["intercept"] = 1.0
covariate_names = ["intercept"] + covariate_names
# Fit Cox model
try:
# PHReg expects time as endog and status as a separate parameter
endog = df["time"]
exog = df[covariate_names]
status = df["event"]
model = statsmodels_phreg(endog, exog, status=status)
result = model.fit()
except Exception as exc:
return f"statsmodels.duration.hazard_regression.PHReg error: {exc}"
# Extract results
try:
# Result attributes may be Series or arrays depending on statsmodels version
coefficients = result.params if isinstance(result.params, np.ndarray) else result.params.values
std_errors = result.bse if isinstance(result.bse, np.ndarray) else result.bse.values
z_statistics = result.tvalues if isinstance(result.tvalues, np.ndarray) else result.tvalues.values
p_values = result.pvalues if isinstance(result.pvalues, np.ndarray) else result.pvalues.values
conf_int = result.conf_int(alpha=alpha_val)
if isinstance(conf_int, np.ndarray):
ci_lower = conf_int[:, 0]
ci_upper = conf_int[:, 1]
else:
ci_lower = conf_int.iloc[:, 0].values
ci_upper = conf_int.iloc[:, 1].values
hazard_ratios = np.exp(coefficients)
# Model statistics
log_likelihood = result.llf
except Exception as exc:
return f"Error extracting results: {exc}"
# Build output table
output = [["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper", "hazard_ratio"]]
for i, param_name in enumerate(covariate_names):
output.append([
param_name,
float(coefficients[i]),
float(std_errors[i]),
float(z_statistics[i]),
float(p_values[i]),
float(ci_lower[i]),
float(ci_upper[i]),
float(hazard_ratios[i])
])
# Add model statistics
output.append(["log_likelihood", float(log_likelihood), "", "", "", "", "", ""])
return output